import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import math as mt
"fivethirtyeight")
plt.style.use(from statsmodels.graphics.tsaplots import plot_acf
Although we have demonstrated the convenience of PyMC3, which encapsulates MCMC in a relatively isolated process, we still need to investigate the basic structure of MCMC.
In this chapter, we will demonstrate the inner mechanism of MCMC with the help of the previous analytical examples. No PyMC3 will be used in this chapter.
Markov Chain Monte Carlo
Markov Chain Monte Carlo (MCMC) is a class of algorithms for simulating a distribution that has no closed-form expression.
To illustrate the mechanism of MCMC, we return to the Gamma-Poisson conjugate example.
Though its posterior has a closed-form expression, we can still simulate the posterior for demonstration purposes.
To use MCMC, the Bayes' Theorem is commonly modified without affecting the final result: \[ P(\lambda \mid y) \propto P(y \mid \lambda) P(\lambda) \] where \(\propto\) means "proportional to"; the integral in the denominator can be safely omitted since it is a constant.
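For reference, the unmodified form of the theorem is \[ P(\lambda \mid y)=\frac{P(y \mid \lambda) P(\lambda)}{\int P(y \mid \lambda) P(\lambda)\, d\lambda} \] The denominator does not depend on \(\lambda\), which is why it can be dropped whenever we only need the posterior up to a normalizing constant.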
Here we recap the example of hurricanes in the last chapter. The prior elicitation uses
\[ E(\lambda) = \frac{\alpha}{\beta}\\ \text{Var}(\lambda) = \frac{\alpha}{\beta^2} \]
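As a worked example consistent with the \(\alpha = 10, \beta = 2\) prior used in the code below, a prior mean of \(5\) hurricanes and a prior variance of \(2.5\) give \[ \beta = \frac{E(\lambda)}{\text{Var}(\lambda)} = \frac{5}{2.5} = 2, \qquad \alpha = E(\lambda)\beta = 5 \times 2 = 10 \]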
x = np.linspace(0, 10, 100)
params = [10, 2]
gamma_pdf = sp.stats.gamma.pdf(x, a=params[0], scale=1 / params[1])

fig, ax = plt.subplots(figsize=(7, 7))
ax.plot(
    x, gamma_pdf, lw=3, label=r"$\alpha = %.1f, \beta = %.1f$" % (params[0], params[1])
)
ax.set_title("Prior")
mean = params[0] / params[1]
mode = (params[0] - 1) / params[1]
ax.axvline(mean, color="tomato", ls="--", label="mean: {}".format(mean))
ax.axvline(mode, color="red", ls="--", label="mode: {}".format(mode))
ax.legend()
plt.show()
Because the posterior will also be a Gamma distribution, we start by proposing a value \[ \lambda = 8 \] This is an arbitrary value, which is called the initial value.
Calculate the likelihood of observing \(k=3\) hurricanes given \(\lambda=8\). \[ \mathcal{L}(3 ; 8)=\frac{\lambda^{k} e^{-\lambda}}{k !}=\frac{8^{3} e^{-8}}{3 !}=0.0286 \]
def pois_lh(k, lamda):
    lh = lamda**k * np.exp(-lamda) / mt.factorial(k)
    return lh

lamda_init = 8
k = 3
pois_lh(k=k, lamda=lamda_init)
0.02862614424768101
- Calculate the prior \[ g(\lambda ; \alpha, \beta)=\frac{\beta^{\alpha} \lambda^{\alpha-1} e^{-\beta \lambda}}{\Gamma(\alpha)} \]
def gamma_prior(alpha, beta, lamda):
    prior = (
        beta**alpha * lamda ** (alpha - 1) * np.exp(-beta * lamda)
        / sp.special.gamma(alpha)
    )
    return prior

lamda_current = lamda_init
alpha = 10
beta = 2
gamma_prior(alpha=alpha, beta=beta, lamda=lamda_current)
0.0426221247856141
- Calculate the posterior with the first guess \(\lambda=8\), which we denote as \(\lambda_{current}\)
k = 3
posterior_current = pois_lh(k=k, lamda=lamda_current) * gamma_prior(
    alpha=10, beta=2, lamda=lamda_current
)
posterior_current
0.0012201070922556493
- Draw a second value \(\lambda_{proposed}\) from a proposal distribution with \(\mu=\lambda_{current}\) and \(\sigma = .5\). The \(\sigma\) here is called the tuning parameter, whose role will become clearer in the following demonstrations.
tuning_param = 0.5
lamda_prop = sp.stats.norm(loc=lamda_current, scale=tuning_param).rvs()
lamda_prop
6.903049306071834
- Calculate the posterior based on \(\lambda_{proposed}\).
posterior_prop = gamma_prior(alpha, beta, lamda=lamda_prop) * pois_lh(
    k, lamda=lamda_prop
)
posterior_prop
0.005584813985229831
- Now we have two posteriors. To proceed, we need a rule to discard one of them. Here we introduce the Metropolis algorithm. The probability threshold for accepting \(\lambda_{proposed}\) is \[ P_{\text {accept }}=\min \left(\frac{P\left(\lambda_{\text {proposed }} \mid \text { data }\right)}{P\left(\lambda_{\text {current }} \mid \text { data }\right)}, 1\right) \]
print(posterior_current)
print(posterior_prop)
0.0012201070922556493
0.005584813985229831
prob_threshold = np.min([posterior_prop / posterior_current, 1])
prob_threshold
1.0
It means the probability of accepting \(\lambda_{proposed}\) is \(1\). What if \(\frac{\text{posterior proposed}}{\text{posterior current}}\) is the smaller value, say \(\text{prob_threshold}=.768\)? The algorithm then requires a draw from a uniform distribution: if the draw is smaller than \(.768\), accept \(\lambda_{proposed}\); if larger, stay with \(\lambda_{current}\).
- A demonstration of this rule:
if sp.stats.uniform.rvs() > 0.768:
    print("stay with current lambda")
else:
    print("accept next lambda")
accept next lambda
- If we accept \(\lambda_{proposed}\), we re-denote it as \(\lambda_{current}\), then repeat from step \(2\) thousands of times.
Combine All Steps
We will join all the steps in a loop that runs thousands of times (the number of repetitions depends on your time constraints and your computer's capacity).
def gamma_poisson_mcmc(
    lamda_init=2, k=3, alpha=10, beta=2, tuning_param=1, chain_size=10000
):
    np.random.seed(123)
    lamda_current = lamda_init

    lamda_mcmc = []
    pass_rate = []
    post_ratio_list = []

    for i in range(chain_size):
        # posterior (up to a constant) at the current lambda
        lh_current = pois_lh(k=k, lamda=lamda_current)
        prior_current = gamma_prior(alpha=alpha, beta=beta, lamda=lamda_current)
        posterior_current = lh_current * prior_current

        # propose a new lambda from a normal centred on the current one
        lamda_proposal = sp.stats.norm(loc=lamda_current, scale=tuning_param).rvs()
        prior_next = gamma_prior(alpha=alpha, beta=beta, lamda=lamda_proposal)
        lh_next = pois_lh(k, lamda=lamda_proposal)

        posterior_proposal = lh_next * prior_next
        post_ratio = posterior_proposal / posterior_current

        # Metropolis acceptance rule
        prob_next = np.min([post_ratio, 1])
        unif_draw = sp.stats.uniform.rvs()

        post_ratio_list.append(post_ratio)
        if unif_draw < prob_next:
            lamda_current = lamda_proposal
            lamda_mcmc.append(lamda_current)
            pass_rate.append("Y")
        else:
            lamda_mcmc.append(lamda_current)
            pass_rate.append("N")
    return lamda_mcmc, pass_rate
With the plain Metropolis rule used here, the proposal distribution must be symmetric; otherwise the Markov chain won't converge to the target posterior (asymmetric proposals require the Metropolis-Hastings correction). Also, the tuning parameter should be set at a value that maintains roughly \(30\%\sim50\%\) acceptance.
lamda_mcmc, pass_rate = gamma_poisson_mcmc(chain_size=10000)
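As a quick sketch (not part of the original steps), we can check the chain's acceptance rate against the \(30\%\sim50\%\) guideline mentioned above.
accept_rate = pass_rate.count("Y") / len(pass_rate)  # share of accepted proposals
print("acceptance rate: {:.1%}".format(accept_rate))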
= ["Pass", "Not Pass"]
yes = [pass_rate.count("Y"), pass_rate.count("N")]
counts
= np.linspace(0, 10, 100)
x = [10, 2]
params_prior = sp.stats.gamma.pdf(x, a=params_prior[0], scale=1 / params_prior[1]) gamma_pdf_prior
We assume the data cover \(1\) year, in which a total of \(3\) hurricanes were recorded. We also obtain the analytical posterior, so that we can compare the simulation with the analytical distribution. \[\begin{align} \alpha_{\text {posterior }}&=\alpha_{0}+\sum_{i=1}^{n} x_{i} = 10+3=13\\ \beta_{\text {posterior }}&=\beta_{0}+n = 2+1=3 \end{align}\] Prepare the analytical Gamma distribution:
params_post = [13, 3]
gamma_pdf_post = sp.stats.gamma.pdf(x, a=params_post[0], scale=1 / params_post[1])
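As a quick sanity check (a sketch added here, not part of the original steps), we can compare the analytical posterior mean \(\alpha/\beta = 13/3 \approx 4.33\) with the mean of the simulated chain; the estimate gets cleaner once the early samples are dropped, as discussed next.
# analytical posterior mean vs. the raw simulated chain mean
print("analytical posterior mean:", params_post[0] / params_post[1])
print("simulated chain mean:", np.mean(lamda_mcmc))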
Because the initial samples might not have converged to the equilibrium distribution, the first \(1/10\) of the Markov chain can be safely dropped. This portion is termed the burn-in period. Also, to reduce autocorrelation, we can thin the chain by keeping only every other (or even every fifth) observation.
That is why we use lamda_mcmc[1000::2]
in the code below.
fig, ax = plt.subplots(figsize=(12, 12), nrows=3, ncols=1)
ax[0].hist(lamda_mcmc[1000::2], bins=100, density=True)
ax[0].set_title(r"Posterior Frequency Distribution of $\lambda$")
ax[0].plot(x, gamma_pdf_prior, label="Prior")
ax[0].plot(x, gamma_pdf_post, label="Posterior")
ax[0].legend()
ax[1].plot(np.arange(len(lamda_mcmc)), lamda_mcmc, lw=1)
ax[1].set_title("Trace")
ax[2].barh(yes, counts, color=["green", "blue"], alpha=0.7)
plt.show()
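To see the effect of burn-in and thinning on autocorrelation, we can compare the ACF of the full chain with the ACF of the trimmed chain using plot_acf imported at the top. This is only a diagnostic sketch; lags=50 is an arbitrary choice.
fig, ax = plt.subplots(figsize=(12, 6), nrows=2, ncols=1)
plot_acf(np.array(lamda_mcmc), lags=50, ax=ax[0], title="ACF of the full chain")
plot_acf(np.array(lamda_mcmc[1000::2]), lags=50, ax=ax[1], title="ACF after burn-in and thinning")
plt.show()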
Diagnostics of MCMC
The demonstration above was deliberately fine-tuned to circumvent potential errors that are common when designing MCMC algorithms. We will now demonstrate how these errors arise and what remedies are available.
Invalid Proposal
Following the Gamma-Poisson example, if we use a proposal distribution with \(\mu=1\), a random draw from this proposal distribution might be a negative number; however, the posterior is a Gamma distribution, which resides only in the positive domain.
The remedy for this type of invalid proposal is straightforward: multiply all negative draws by \(-1\), i.e. reflect them back into the positive domain.
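To get a sense of how often invalid proposals occur here, note that a normal proposal with \(\mu=1\) and \(\sigma=1\) lands below zero about \(16\%\) of the time:
# probability that a N(1, 1) proposal is negative
sp.stats.norm.cdf(0, loc=1, scale=1)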
x_gamma = np.linspace(-3, 12, 100)
x_norm = np.linspace(-3, 6, 100)
params_gamma = [10, 2]
gamma_pdf = sp.stats.gamma.pdf(x_gamma, a=params_gamma[0], scale=1 / params_gamma[1])

mu = 1
sigma = 1
normal_pdf = sp.stats.norm.pdf(x_norm, loc=mu, scale=sigma)

fig, ax = plt.subplots(figsize=(14, 7))

ax.plot(
    x_gamma,
    gamma_pdf,
    lw=3,
    label=r"Prior $\alpha = %.1f, \beta = %.1f$" % (params_gamma[0], params_gamma[1]),
    color="#FF6B1A",
)
ax.plot(
    x_norm,
    normal_pdf,
    lw=3,
    label=r"Proposal $\mu=%.1f , \sigma= %.1f$" % (mu, sigma),
    color="#662400",
)
ax.text(4, 0.27, "Gamma Prior", color="#FF6B1A")
ax.text(1.7, 0.37, "Normal Proposal", color="#662400")
ax.text(0.2, -0.04, r"$\lambda_{current}=1$", color="tomato")

x_fill = np.linspace(-3, 0, 30)
y_fill = sp.stats.norm.pdf(x_fill, loc=mu, scale=sigma)
ax.fill_between(x_fill, y_fill, color="#B33F00")

ax.axvline(mu, color="red", ls="--", label=r"$\mu=${}".format(mu), alpha=0.4)

ax.legend()
plt.show()
Two lines of code will solve this issue:
if lamda_proposal < 0:
    lamda_proposal *= -1
def gamma_poisson_mcmc_1(
    lamda_init=2, k=3, alpha=10, beta=2, tuning_param=1, chain_size=10000
):
    np.random.seed(123)
    lamda_current = lamda_init

    lamda_mcmc = []
    pass_rate = []
    post_ratio_list = []

    for i in range(chain_size):
        lh_current = pois_lh(k=k, lamda=lamda_current)
        prior_current = gamma_prior(alpha=alpha, beta=beta, lamda=lamda_current)
        posterior_current = lh_current * prior_current

        lamda_proposal = sp.stats.norm(loc=lamda_current, scale=tuning_param).rvs()
        # reflect invalid negative proposals back into the positive domain
        if lamda_proposal < 0:
            lamda_proposal *= -1
        prior_next = gamma_prior(alpha=alpha, beta=beta, lamda=lamda_proposal)
        lh_next = pois_lh(k, lamda=lamda_proposal)

        posterior_proposal = lh_next * prior_next
        post_ratio = posterior_proposal / posterior_current

        prob_next = np.min([post_ratio, 1])
        unif_draw = sp.stats.uniform.rvs()

        post_ratio_list.append(post_ratio)
        if unif_draw < prob_next:
            lamda_current = lamda_proposal
            lamda_mcmc.append(lamda_current)
            pass_rate.append("Y")
        else:
            lamda_mcmc.append(lamda_current)
            pass_rate.append("N")
    return lamda_mcmc, pass_rate
This time we can set the chain size much larger.
lamda_mcmc, pass_rate = gamma_poisson_mcmc_1(chain_size=100000, tuning_param=1)
As you can see, the frequency distribution is also much smoother.
= pass_rate.count("Y") / len(pass_rate)
y_rate = pass_rate.count("N") / len(pass_rate)
n_rate = ["Pass", "Not Pass"]
yes = [pass_rate.count("Y"), pass_rate.count("N")]
counts
= plt.subplots(figsize=(12, 12), nrows=3, ncols=1)
fig, ax 0].hist(lamda_mcmc[int(len(lamda_mcmc) / 10) :: 2], bins=100, density=True)
ax[0].set_title(r"Posterior Frequency Distribution of $\lambda$")
ax[0].plot(x, gamma_pdf_prior, label="Prior")
ax[0].plot(x, gamma_pdf_post, label="Posterior")
ax[0].legend()
ax[1].plot(np.arange(len(lamda_mcmc)), lamda_mcmc, lw=1)
ax[1].set_title("Trace")
ax[
2].barh(yes, counts, color=["green", "blue"], alpha=0.7)
ax[2].text(
ax[1] * 0.4,
counts["Not Pass",
r"${}\%$".format(np.round(n_rate * 100, 2)),
="tomato",
color=28,
size
)2].text(
ax[0] * 0.4,
counts["Pass",
r"${}\%$".format(np.round(y_rate * 100, 2)),
="tomato",
color=28,
size
) plt.show()
Numerical Underflow
If the prior and likelihood are both extremely close to \(0\), their product will be even closer to \(0\). This can cause numerical underflow, i.e. the value becomes too small for the computer's floating-point representation to store.
The remedy is to use the log version of Bayes’ Theorem, i.e.
\[ \ln{P(\lambda \mid y)} \propto \ln{P(y \mid \lambda)}+ \ln{P(\lambda)} \]
Also, the acceptance rule can be converted into its log version
\[ \ln{ \left(\frac{P\left(\lambda_{proposed } \mid y \right)}{P\left(\lambda_{current} \mid y \right)}\right)} =\ln{P\left(\lambda_{proposed } \mid y \right)} - \ln{P\left(\lambda_{current } \mid y \right)} \]
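A minimal sketch of why the log form helps: for an extremely unlikely value of \(\lambda\) (here \(\lambda=300\), an arbitrary value chosen only to trigger underflow), the product of likelihood and prior underflows to \(0\), while the sum of their logs remains a finite number.
lamda_far = 300  # arbitrary extreme value for illustration
print(pois_lh(3, lamda_far) * gamma_prior(10, 2, lamda_far))  # underflows to 0.0
print(np.log(pois_lh(3, lamda_far)) + np.log(gamma_prior(10, 2, lamda_far)))  # finite log posterior (up to a constant)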
def gamma_poisson_mcmc_2(
    lamda_init=2, k=3, alpha=10, beta=2, tuning_param=1, chain_size=10000
):
    np.random.seed(123)
    lamda_current = lamda_init

    lamda_mcmc = []
    pass_rate = []
    post_ratio_list = []

    for i in range(chain_size):
        # work with log likelihood and log prior to avoid underflow
        log_lh_current = np.log(pois_lh(k=k, lamda=lamda_current))
        log_prior_current = np.log(
            gamma_prior(alpha=alpha, beta=beta, lamda=lamda_current)
        )
        log_posterior_current = log_lh_current + log_prior_current

        lamda_proposal = sp.stats.norm(loc=lamda_current, scale=tuning_param).rvs()
        if lamda_proposal < 0:
            lamda_proposal *= -1
        log_prior_next = np.log(
            gamma_prior(alpha=alpha, beta=beta, lamda=lamda_proposal)
        )
        log_lh_next = np.log(pois_lh(k, lamda=lamda_proposal))

        log_posterior_proposal = log_lh_next + log_prior_next
        # the posterior ratio is the exponential of the log difference
        log_post_ratio = log_posterior_proposal - log_posterior_current
        post_ratio = np.exp(log_post_ratio)

        prob_next = np.min([post_ratio, 1])
        unif_draw = sp.stats.uniform.rvs()

        post_ratio_list.append(post_ratio)
        if unif_draw < prob_next:
            lamda_current = lamda_proposal
            lamda_mcmc.append(lamda_current)
            pass_rate.append("Y")
        else:
            lamda_mcmc.append(lamda_current)
            pass_rate.append("N")
    return lamda_mcmc, pass_rate
With the log posterior and the log acceptance rule, numerical underflow is unlikely to happen anymore, which means we can run a much longer Markov chain and also use a larger tuning parameter.
lamda_mcmc, pass_rate = gamma_poisson_mcmc_2(chain_size=1000, tuning_param=3)
= pass_rate.count("Y") / len(pass_rate)
y_rate = pass_rate.count("N") / len(pass_rate)
n_rate = ["Pass", "Not Pass"]
yes = [pass_rate.count("Y"), pass_rate.count("N")]
counts
= plt.subplots(figsize=(12, 12), nrows=3, ncols=1)
fig, ax 0].hist(lamda_mcmc[int(len(lamda_mcmc) / 10) :: 2], bins=100, density=True)
ax[0].set_title(r"Posterior Frequency Distribution of $\lambda$")
ax[0].plot(x, gamma_pdf_prior, label="Prior")
ax[0].plot(x, gamma_pdf_post, label="Posterior")
ax[0].legend()
ax[1].plot(np.arange(len(lamda_mcmc)), lamda_mcmc, lw=1)
ax[1].set_title("Trace")
ax[
2].barh(yes, counts, color=["green", "blue"], alpha=0.7)
ax[2].text(
ax[1] * 0.4,
counts["Not Pass",
r"${}\%$".format(np.round(n_rate * 100, 2)),
="tomato",
color=28,
size
)2].text(
ax[0] * 0.4,
counts["Pass",
r"${}\%$".format(np.round(y_rate * 100, 2)),
="tomato",
color=28,
size
) plt.show()
A larger tuning parameter yields a lower acceptance rate (within the \(30\%\sim50\%\) band), which is exactly what we are looking for.
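As a closing sketch (an illustration added here, not part of the original text), we can compare acceptance rates across a few tuning parameters to see how a wider proposal pushes the rate down toward the target band.
# acceptance rates for a few illustrative tuning parameters
for tp in [0.5, 1, 3]:
    _, pr = gamma_poisson_mcmc_2(chain_size=2000, tuning_param=tp)
    print("tuning_param = {}: acceptance rate = {:.1%}".format(tp, pr.count("Y") / len(pr)))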